Abstract



1 Introduction 



The classical channel coding theorem states that the maximum amount of information
that can be transmitted through a noisy classical communication channel (asymptotically,
per channel use) is equal to the maximum amount of mutual information that
can be created between the input and the output of the channel, where the mutual
information is measured as the relative entropy distance of a joint distribution from the
product of its marginals. The celebrated Holevo-Schumacher-Westmoreland (HSW)
theorem [15, 26] states that the same holds for the classical information carrying
capacity of a quantum channel (or equivalently, the capacity of a classical-quantum
channel), for the case of product state inputs and collective measurements on the
outputs. The capacity is evaluated in the scenario in which the channel is considered
to be used an asymptotically large number of times and under the condition that the
probability of error in decoding the output vanishes asymptotically in the number
of uses of the channel. Moreover, it is assumed that the channel is memoryless, i.e.,
there is no correlation in the noise of the channel acting on successive input states.
In real-world applications, however, a channel can only be used finitely many times 
and the assumption of the channel being memoryless is not always justifiable, either. 
Therefore, it is important to evaluate the optimal rate of information transmission for 
a finite number of uses of a channel. 

In this paper we focus on transmission of classical information through a single 
use of a quantum channel, which can itself correspond to a finite number of uses 
of a channel with arbitrarily correlated noise. For a general quantum channel, it is 
not possible to achieve zero probability of error on a single use. So in this case the 
capacity is evaluated under the constraint that the probability of error stays below 
some given threshold $\varepsilon > 0$. We hence refer to it as the one-shot $\varepsilon$-capacity of the
classical-quantum channel. In this paper we find bounds on this capacity in terms of 
quantities derived from generalized relative entropies, namely the Hoeffding distance 
and the max-relative entropy. 



Our main result, Theorem 3.2, shows that one can find a lower bound on the
one-shot capacity of a classical-quantum channel in terms of its Hoeffding capacity,
which is defined in the same way as the Holevo capacity, but with the relative entropy
replaced with a Hoeffding distance in its definition. The main idea of the proof is
a combination of the quantum random coding argument of [10] and a fundamental
inequality of hypothesis testing [1, Theorem 1]. It is worth noting that hypothesis
testing and channel coding are closely related to each other, and hypothesis testing
results were already used to obtain coding theorems for classical-quantum channels
e.g. in [HlIiniES]. As an application of these techniques, we also show in Theorem 3.6 a
lower bound on the exponential capacity of a classical-quantum channel, defined as the
optimal asymptotic transmission rate under the constraint that the error probabilities
vanish with a given exponential speed.

A geometric interpretation of the asymptotic channel capacity was given in [2^
(see also ^ for classical channels), where it was shown that the Holevo capacity of 
a channel is equal to the divergence radius of its image, as measured by the relative 
entropy. In Theorem 4.1 we show an upper bound on the one-shot capacity of a
classical-quantum channel in terms of the divergence radius of its image, as measured 
by the max-relative entropy. 

The paper is organized as follows: In Sections 2.1 and 2.2 we introduce the various
generalized relative entropies used in the paper, and Section 2.3 is devoted to a brief
overview of channel coding and various notions of channel capacities. In Sections 3
and 4 we prove our lower and upper bounds on the one-shot capacities. To keep the
presentation reasonably compact, we have moved some of the arguments and examples 
into four separate Appendices. 



2 Preliminaries 

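2.1 Rényi relative entropies and related quantities

For a finite dimensional Hilbert space $\mathcal{H}$, let $\mathcal{S}(\mathcal{H})$ denote the set of density operators
on $\mathcal{H}$, and define

\[ \psi : \mathcal{S}(\mathcal{H}) \times \mathcal{S}(\mathcal{H}) \times \mathbb{R} \to \mathbb{R}, \qquad \psi : (\rho,\sigma,t) \mapsto \psi_{\rho,\sigma}(t) := \log \operatorname{Tr} \rho^{t}\sigma^{1-t}. \]

(Note that we use the convention $\log 0 := -\infty$ and $0^t := 0$, $t \in \mathbb{R}$. By the latter, powers
of a positive semidefinite operator are defined only on its support; in particular, $\rho^0$
stands for the support projection of $\rho$.) For density operators $\rho,\sigma \in \mathcal{S}(\mathcal{H})$, their Rényi
relative entropy of order $t \in [0,1)$ is defined as

\[ S_t(\rho\|\sigma) := \frac{1}{t-1}\,\psi_{\rho,\sigma}(t) = \frac{1}{t-1}\log \operatorname{Tr} \rho^{t}\sigma^{1-t}. \]

One can easily see that

\[ S_1(\rho\|\sigma) := \lim_{t \nearrow 1} S_t(\rho\|\sigma) = S(\rho\|\sigma) = \begin{cases} \operatorname{Tr}\rho(\log\rho - \log\sigma), & \operatorname{supp}\rho \le \operatorname{supp}\sigma, \\ +\infty, & \text{otherwise}, \end{cases} \]

where $S(\rho\|\sigma)$ denotes the usual relative entropy of $\rho$ and $\sigma$.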

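The Hoeffding distances of $\rho$ and $\sigma$ are obtained from the Rényi relative entropies as

\[ H_r(\rho\|\sigma) := \sup_{0 \le t < 1} \frac{-tr - \psi_{\rho,\sigma}(t)}{1-t} = \sup_{0 \le t < 1} \left\{ S_t(\rho\|\sigma) - \frac{tr}{1-t} \right\} \]

for every $r \ge 0$. Note that $t \mapsto S_t(\rho\|\sigma)$ is monotonic increasing [H Lemma 8], and
hence, $H_0(\rho\|\sigma) = \lim_{t \nearrow 1} S_t(\rho\|\sigma) = S(\rho\|\sigma)$. It is also clear from the definition
that $r \mapsto H_r(\rho\|\sigma)$ is monotonic decreasing, and one can easily see that

\[ S_0(\rho\|\sigma) = H_\infty(\rho\|\sigma) \le H_r(\rho\|\sigma) \le H_0(\rho\|\sigma) = S(\rho\|\sigma), \qquad r \ge 0, \tag{2} \]

where $H_\infty(\rho\|\sigma) := \lim_{r\to\infty} H_r(\rho\|\sigma)$. Let

\[ \varphi_{\rho,\sigma}(a) := \sup_{0 \le t \le 1}\{at - \psi_{\rho,\sigma}(t)\}, \qquad \hat\varphi_{\rho,\sigma}(a) := \sup_{0 \le t \le 1}\{a(t-1) - \psi_{\rho,\sigma}(t)\}, \qquad a \in \mathbb{R}. \]

Note that for fixed $\rho,\sigma \in \mathcal{S}(\mathcal{H})$, the function $t \mapsto \psi_{\rho,\sigma}(t)$ is convex on $\mathbb{R}$, and $a \mapsto \varphi_{\rho,\sigma}(a)$
is its polar function (or Legendre transform) on the interval $[0,1]$. For an
analysis of the properties of these functions, see e.g. [H]. It was also shown in [12]
that for fixed $\rho$ and $\sigma$ and each $r > -\psi_{\rho,\sigma}(1)$, there exists a unique $a_r < \partial^-\psi_{\rho,\sigma}(1)$
(the left derivative of $\psi_{\rho,\sigma}$ at 1) such that $\hat\varphi_{\rho,\sigma}(a_r) = r$, and

\[ H_r(\rho\|\sigma) = \varphi_{\rho,\sigma}(a_r), \quad \text{i.e.,} \quad H_r(\rho\|\sigma) = \left(\varphi_{\rho,\sigma} \circ \hat\varphi_{\rho,\sigma}^{-1}\right)(r), \qquad r > -\psi_{\rho,\sigma}(1). \tag{3} \]

Note that $a \mapsto \hat\varphi_{\rho,\sigma}(a)$ is strictly monotonically decreasing on the interval $(-\infty, \partial^-\psi_{\rho,\sigma}(1)]$,
and $\hat\varphi_{\rho,\sigma}^{-1}$ denotes its inverse on this interval. Since both $\hat\varphi_{\rho,\sigma}^{-1}$ and $\varphi_{\rho,\sigma}$ are continuous,
(3) yields

\[ \lim_{r \searrow 0} H_r(\rho\|\sigma) = H_0(\rho\|\sigma) = S(\rho\|\sigma). \tag{4} \]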

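Finally, the Chernoff distance of $\rho,\sigma \in \mathcal{S}(\mathcal{H})$ is defined from the function $\varphi_{\rho,\sigma}$ as

\[ C(\rho\|\sigma) := \varphi_{\rho,\sigma}(0) = -\min_{0 \le t \le 1} \psi_{\rho,\sigma}(t). \]

One can easily see that the Chernoff distance also falls between $S_0$ and $S_1$, i.e.,

\[ S_0(\rho\|\sigma) \le C(\rho\|\sigma) \le S(\rho\|\sigma). \]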
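For commuting states all of the quantities above reduce to expressions in the eigenvalue vectors, which makes the ordering in (2) and the bounds on the Chernoff distance easy to check numerically. The following is a minimal sketch under that assumption: the states are given as hypothetical probability vectors `p` and `q`, logarithms are taken to base 2, and the suprema over $t$ are replaced by a maximum over a finite grid, so `hoeffding` and `chernoff` only approximate the true values from below. All function names are illustrative and not taken from the paper.

```python
import math

def psi(p, q, t):
    """psi(t) = log2 sum_i p_i^t q_i^(1-t), using the convention 0^t := 0."""
    s = sum(pi ** t * qi ** (1.0 - t) for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    return math.log2(s) if s > 0 else float("-inf")

def renyi(p, q, t):
    """Renyi relative entropy S_t(p||q) = psi(t) / (t - 1), for 0 <= t < 1."""
    return psi(p, q, t) / (t - 1.0)

def relative_entropy(p, q):
    """Relative entropy S(p||q) in bits; +inf unless supp p <= supp q."""
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return float("inf")
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hoeffding(p, q, r, grid=2000):
    """Grid estimate (from below) of H_r = sup_{0<=t<1} (-t*r - psi(t)) / (1 - t)."""
    ts = (k / grid for k in range(grid))
    return max((-t * r - psi(p, q, t)) / (1.0 - t) for t in ts)

def chernoff(p, q, grid=2000):
    """Grid estimate (from below) of C = -min_{0<=t<=1} psi(t)."""
    return max(-psi(p, q, k / grid) for k in range(grid + 1))

# Hypothetical example data: p is not faithful, so S_0(p||q) > 0.
p = [0.8, 0.2, 0.0]
q = [0.3, 0.3, 0.4]
s0, s1 = renyi(p, q, 0.0), relative_entropy(p, q)
for r in (0.0, 0.5, 2.0):
    print(r, s0, hoeffding(p, q, r), s1)   # S_0 <= H_r <= S in every row
print("Chernoff:", chernoff(p, q))
```

In this example the function $\psi$ is increasing on $[0,1]$, so the Chernoff distance coincides with $S_0$; other choices of `p` and `q` give a value strictly between $S_0$ and $S$.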

The Rényi relative entropies, the Chernoff distance and the Hoeffding distances
are all non-negative and hence can be considered as generalized distances between
states (though they are not symmetric in their variables, except for the Chernoff
distance, and do not satisfy the triangle inequality). The relative entropy and the
Chernoff distance are also strictly positive, unless the two states are equal. Due to
Lieb's concavity theorem [17] and Uhlmann's method [27], all these quantities are
jointly convex in the variables $(\rho,\sigma)$ and monotonic decreasing under stochastic (i.e.,
completely positive and trace-preserving) maps acting simultaneously on $\rho$ and $\sigma$ (see
for an alternative proof). Finally, all these quantities emerge naturally as the 
optimal decay rates of certain error probabilities in asymptotic hypothesis testing 
problems; see, e.g. PElElElIIIlliainilllllHlIIllEOlEIlEg. 
2.2 The max-relative entropy

Following [8], we define the max-relative entropy of states $\rho$ and $\sigma$ as

\[ S_{\max}(\rho\|\sigma) := \log\inf\{\lambda : \rho \le \lambda\sigma\} = \inf\{\gamma : \rho \le 2^{\gamma}\sigma\}. \]

(Note that our notation here differs from that of [8], where the max-relative entropy
of states $\rho$ and $\sigma$ was denoted by the symbol $D_{\max}(\rho\|\sigma)$.) One can easily
see that for commuting $\rho$ and $\sigma$ with $\operatorname{supp}\rho \le \operatorname{supp}\sigma$, the max-relative entropy
coincides with the Rényi relative entropy of parameter $\infty$, defined as
$S_\infty(\rho\|\sigma) := \lim_{t\to\infty} \frac{1}{t-1}\log\operatorname{Tr}\rho^{t}\sigma^{1-t}$. The truly quantum case, however, is different, and the
max-relative entropy turns out to be an independent quantity; see e.g. Example A.1. In
the general case, $S_{\max}$ and $S_\infty$ are related as

\[ S_{\max}(\rho\|\sigma) \le S_\infty(\rho\|\sigma) \]

whenever $\operatorname{supp}\rho \le \operatorname{supp}\sigma$ [5]. One can see from the definition that

\[ S(\rho\|\sigma) \le S_{\max}(\rho\|\sigma) \tag{5} \]

for all states $\rho$, $\sigma$. In particular, the max-relative entropy is also strictly positive
(unless the two states are equal). It also follows easily from the definition that the
max-relative entropy is monotonic decreasing under arbitrary positive (not necessarily
stochastic) maps acting simultaneously on $\rho$ and $\sigma$. These and other properties of the
max-relative entropy were discussed in [8]. On the other hand, the max-relative
entropy is not jointly convex in its variables in general; see e.g. Example A.2.



The max-relative entropy is also related to the optimal performance in a state
discrimination problem, as it was shown recently in [16]. Consider a multiple state
discrimination problem where the hypotheses $\rho_1,\ldots,\rho_M$ to discriminate are states on
some Hilbert space $\mathcal{H}$. The optimal average success probability is given as

\[ P^* := \sup_{\{E_1,\ldots,E_M\}} \frac{1}{M}\sum_{k=1}^{M} \operatorname{Tr}\rho_k E_k, \]

where the supremum is taken over all positive operator valued measures (POVM)
$\{E_1,\ldots,E_M\}$, $0 \le E_k \le I$, $\sum_{k=1}^{M} E_k = I$. Theorem 1 in [16] yields that

\[ P^* = \frac{1}{M}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\max_{1\le k\le M} 2^{S_{\max}(\rho_k\|\sigma)}. \tag{6} \]

Since our formulation here is slightly different from that of [16], for readers'
convenience we give a brief sketch of the proof of (6) in Appendix B.



2.3 Capacities of classical-quantum channels 

By a classical-quantum communication channel (or simply a channel) we mean a triple
$(\mathcal{X},\mathcal{H},W)$, where $\mathcal{X}$ is a set, $\mathcal{H}$ is a Hilbert space and $W$ maps elements of $\mathcal{X}$ into
density operators on $\mathcal{H}$. If no confusion arises, we will denote the channel simply by
$W$. Elements of $\mathcal{X}$ are the possible inputs for the channel and $\operatorname{ran}W$ is the set of
the possible outputs, which we will also call the image of the channel. The channel
is classical if its image is a commutative subset of $\mathcal{B}(\mathcal{H})$. Note that the standard
definition of a quantum channel is recovered by choosing $\mathcal{X}$ to be the state space
$\mathcal{S}(\mathcal{H}_{\mathrm{in}})$ of some Hilbert space $\mathcal{H}_{\mathrm{in}}$ and $W$ to be a completely positive trace-preserving
linear map from $\mathcal{B}(\mathcal{H}_{\mathrm{in}})$ to $\mathcal{B}(\mathcal{H})$.

In order to use the channel for transmitting (classical) messages, one has to assign a
codeword to each message, which is an element in the input set $\mathcal{X}$. After the message
is transmitted through the channel, the receiver has to decide which message was
sent. If the receiver knows the codewords and how the channel acts on them, then
the receiver's task is to perform state discrimination on the possible outcomes of the channel.
We say that a triple $(M,\varphi,E)$ is an $M$-code if $\varphi$ is a function from $\{1,\ldots,M\}$ to
$\mathcal{X}$ (the encoding) and $E$ is a function from $\{1,\ldots,M\}$ to $\mathcal{B}(\mathcal{H})$ (the decoding) such
that $E_k \ge 0$, $k = 1,\ldots,M$, and $\sum_{k=1}^{M} E_k \le I$. Here, $1,\ldots,M$ are the labels of
the messages the sender would like to transmit through the channel, $\varphi_1,\ldots,\varphi_M$ are
the codewords, and $E_1,\ldots,E_M$ are the POVM operators to discriminate the states
$W_{\varphi_1},\ldots,W_{\varphi_M}$ at the output of the channel. The average error probability of such an
$M$-code is

\[ P_e(M,\varphi,E) := \frac{1}{M}\sum_{k=1}^{M} \operatorname{Tr} W_{\varphi_k}(I - E_k). \]

For a given $\varepsilon > 0$, the one-shot $\varepsilon$-capacity of the channel is the maximum number of
bits that can be transmitted through the channel with error probability at most $\varepsilon$:

\[ C_\varepsilon(W) := \sup\{\log M : \text{there exists an } M\text{-code with } P_e(M,\varphi,E) \le \varepsilon\}. \]

Here, the base of the logarithm is chosen to be 2. Note that one could also define the
$\varepsilon$-capacity using the maximum error probability
$P_{e,\max}(M,\varphi,E) := \max_{1\le k\le M} \operatorname{Tr} W_{\varphi_k}(I - E_k)$ instead. This capacity $C_{\varepsilon,\max}(W)$ is related to $C_\varepsilon(W)$ as
$C_{\varepsilon/2}(W) - 1 \le C_{\varepsilon,\max}(W) \le C_\varepsilon(W)$, where the second inequality is obvious and the first one follows by "throwing
away the worst half of the codewords" (see e.g. [SI p. 204]).
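The "throwing away the worst half" step can be sketched as follows. This is only an illustration under stated assumptions: the list `errors` holds hypothetical per-codeword error probabilities $\operatorname{Tr} W_{\varphi_k}(I-E_k)$ of a code whose average error is at most $\varepsilon/2$, and the helper name `expurgate` is not from the paper. Since fewer than half of the codewords can then have error above $\varepsilon$, keeping the best $\lceil M/2\rceil$ of them (with their original decoding operators, whose sum is still at most $I$) gives a code with maximum error at most $\varepsilon$ at the cost of at most one bit.

```python
import math

def expurgate(errors, eps):
    """Keep the ceil(M/2) codewords with the smallest individual error probabilities.

    errors[k] stands for Tr W_{phi_k}(I - E_k). If the average is at most eps/2,
    fewer than M/2 entries can exceed eps, so every kept codeword has error
    probability at most eps.
    """
    m = len(errors)
    assert sum(errors) / m <= eps / 2 + 1e-12, "average error must be at most eps/2"
    keep = sorted(range(m), key=lambda k: errors[k])[: math.ceil(m / 2)]
    return sorted(keep)

errors = [0.01, 0.30, 0.02, 0.05, 0.00, 0.22]   # hypothetical per-codeword errors
eps = 0.2
kept = expurgate(errors, eps)
print(kept, max(errors[k] for k in kept))        # indices kept and their worst error
print(math.log2(len(kept)), ">=", math.log2(len(errors)) - 1)
```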

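Consider now the $n$th product extension of the channel $W$, defined as

\[ W^{(n)} : \mathcal{X}^n \to \mathcal{S}(\mathcal{H}^{\otimes n}), \qquad W^{(n)}(x_1,\ldots,x_n) := W_{x_1}\otimes\cdots\otimes W_{x_n}. \]

Note that if $W$ is a quantum channel with $\mathcal{X} = \mathcal{S}(\mathcal{H}_{\mathrm{in}})$ then

\[ W^{(n)}(\rho_1,\ldots,\rho_n) = W^{\otimes n}(\rho_1\otimes\cdots\otimes\rho_n), \qquad \rho_1,\ldots,\rho_n \in \mathcal{S}(\mathcal{H}_{\mathrm{in}}). \]

Hence, this formulation only allows product encoding, while entangled measurement
is allowed in the decoding. The asymptotic $\varepsilon$-capacity of $W$ is defined as

\[ \overline{C}_\varepsilon(W) := \sup\left\{ \liminf_{n\to\infty}\frac{1}{n}\log M^{(n)} : \limsup_{n\to\infty} P_e(M^{(n)},\varphi^{(n)},E^{(n)}) \le \varepsilon \right\}, \]

where the supremum is taken over sequences of codes $(M^{(n)},\varphi^{(n)},E^{(n)})$ satisfying the
indicated criterion. One can easily see that

\[ \liminf_{n\to\infty}\frac{1}{n}C_\varepsilon(W^{(n)}) \le \overline{C}_\varepsilon(W) \le \overline{C}_{\varepsilon'}(W) \le \liminf_{n\to\infty}\frac{1}{n}C_{\varepsilon''}(W^{(n)}) \tag{7} \]

for any $0 \le \varepsilon < \varepsilon' < \varepsilon''$. One can also define a stronger notion of asymptotic capacity
by requiring that the error probabilities vanish with a given exponential speed:

\[ C^{\mathrm{exp}}_r(W) := \sup\left\{ \liminf_{n\to\infty}\frac{1}{n}\log M^{(n)} : \limsup_{n\to\infty}\frac{1}{n}\log P_e(M^{(n)},\varphi^{(n)},E^{(n)}) < -r \right\}. \]

Obviously, $r \mapsto C^{\mathrm{exp}}_r(W)$ is monotonic decreasing, and $C^{\mathrm{exp}}_r(W) \le C^{\mathrm{exp}}_0(W) \le \overline{C}_0(W) \le \overline{C}_\varepsilon(W)$ for any
$0 \le r, \varepsilon$.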

Let $\mathcal{M}_f(\mathcal{X})$ denote the set of finitely supported probability measures on $\mathcal{X}$, and
define $\mathcal{K} := l^2(\mathcal{X})$, the $L^2$-space on $\mathcal{X}$ with respect to the counting measure. For
each $x \in \mathcal{X}$, define the rank-one projection $\delta_x := |1_{\{x\}}\rangle\langle 1_{\{x\}}|$, where $1_{\{x\}}$ is the
characteristic function of the one-point set $\{x\}$. For a finitely supported probability
measure $p$ on $\mathcal{X}$, let

\[ R_p := \sum_x p_x\,\delta_x\otimes W_x, \qquad Q_p := \Big(\sum_x p_x\,\delta_x\Big)\otimes \mathbb{E}_p(W), \tag{8} \]

where $\mathbb{E}_p(W) := \sum_x p_x W_x$. Obviously, $R_p$ and $Q_p$ are density operators on $\mathcal{K}\otimes\mathcal{H}$,
and $Q_p$ is the product of the marginals of $R_p$. Hence, $S(R_p\|Q_p)$ is the mutual
information in the bipartite classical-quantum state $R_p$, defined as its distance from
the product of its marginals, where the distance is measured by the relative entropy.
The Holevo-Schumacher-Westmoreland theorem [15, 26] states that

\[ \overline{C}_0(W) = \chi^*(W) := \sup_{p\in\mathcal{M}_f(\mathcal{X})} S(R_p\|Q_p). \]

The quantity $\chi^*(W)$ is called the Holevo capacity of $W$.



3 Hoeffding capacities and lower bounds

It is a natural idea to measure the amount of correlations in a bipartite state as its
distance from the product of its marginals, and the channel coding theorem selects
the relative entropy as the right measure of distance. One may, however, define the
amount of correlations using some generalized relative entropy $D(\cdot\|\cdot)$, and define the
corresponding version of the Holevo capacity as $\chi^*_D(W) := \sup_{p\in\mathcal{M}_f(\mathcal{X})} D(R_p\|Q_p)$. In
particular, for a channel $W$ we define its Hoeffding capacity with parameter $r \ge 0$ as

\[ \chi^*_r(W) := \chi^*_{H_r}(W) := \sup_{p\in\mathcal{M}_f(\mathcal{X})} H_r(R_p\|Q_p), \]

where $R_p$ and $Q_p$ are as in (8). Note that if $D$ is the relative entropy then for any $p \in \mathcal{M}_f(\mathcal{X})$,

\[ S(R_p\|Q_p) = \sum_x p_x S(W_x\|\mathbb{E}_p(W)) = S(\mathbb{E}_p(W)) - \sum_x p_x S(W_x), \]

where $S(\rho) := -S(\rho\|I)$ is the von Neumann entropy of a state $\rho$. These identities
are specific to the relative entropy and do not hold for a general $D$. However, if $D$ is
jointly convex in its variables and invariant under adding an ancilla then

\[ D(R_p\|Q_p) \le \sum_x p_x D(\delta_x\otimes W_x\|\delta_x\otimes\mathbb{E}_p(W)) = \sum_x p_x D(W_x\|\mathbb{E}_p(W)). \]

This holds, for instance, for the Rényi relative entropies with parameter $t \in [0,1]$, the
Hoeffding distances and the Chernoff distance.

Our main goal in this section is to give lower bounds on the one-shot capacity and
the exponential capacities of a classical-quantum channel in terms of its Hoeffding
capacity. We will make use of the following lemma, which is essentially the same as
inequality (11) in [9]. For readers' convenience, we give a detailed proof in Appendix C.

Lemma 3.1. For any $M \in \mathbb{N}$, any $c > 0$ and any finitely supported probability
distribution $p$ on $\mathcal{X}$, there exists an $M$-code $(M,\varphi,E)$ such that

\[ P_e(M,\varphi,E) \le (1+c)^{t}(2+c+1/c)^{1-t}(M-1)^{1-t}\operatorname{Tr}R_p^{t}Q_p^{1-t}, \qquad 0 \le t \le 1. \]

Theorem 3.2. For any channel $W$, the one-shot $\varepsilon$-capacity is lower bounded as

\[ C_\varepsilon(W) \ge \chi^*_{\log((1+c)/\varepsilon)}(W) - \log\frac{2+c+1/c}{\varepsilon} \tag{9} \]

for any $c > 0$.



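Proof. By Lemma 3.1,

\[ C_\varepsilon(W) \ge \log M \tag{10} \]

for any $M$ such that there exist a $p \in \mathcal{M}_f(\mathcal{X})$, a $c > 0$ and a $t \in [0,1)$ such that

\[ (1+c)^{t}(2+c+1/c)^{1-t}(M-1)^{1-t}\operatorname{Tr}R_p^{t}Q_p^{1-t} \le \varepsilon. \]

Rewriting this condition, we get

\[ \begin{aligned} \log(M-1) &\le \frac{1}{1-t}\Big( t\log\frac{\varepsilon}{1+c} + (1-t)\log\frac{\varepsilon}{2+c+1/c} - \log\operatorname{Tr}R_p^{t}Q_p^{1-t} \Big) \\ &\le -\log\frac{2+c+1/c}{\varepsilon} + \sup_{0\le t<1}\frac{-t\log((1+c)/\varepsilon) - \log\operatorname{Tr}R_p^{t}Q_p^{1-t}}{1-t} \\ &= -\log\frac{2+c+1/c}{\varepsilon} + H_{\log((1+c)/\varepsilon)}(R_p\|Q_p), \end{aligned} \]

from which the statement follows. □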
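To get a feeling for the size of the bound (9), one can evaluate its right-hand side for a classical channel at a fixed input distribution. The sketch below makes several assumptions that are not in the text: the channel is given as a hypothetical row-stochastic matrix `W` (row `x` is the output distribution $W_x$), so that $\operatorname{Tr}R_p^tQ_p^{1-t} = \sum_x p_x\sum_y W_x(y)^t q(y)^{1-t}$ with $q = \mathbb{E}_p(W)$; the supremum over $t$ is replaced by a grid maximum; and only one input law `p` is used instead of the supremum defining the Hoeffding capacity. Both simplifications can only decrease the value, so the result is still a valid lower bound on $C_\varepsilon(W)$.

```python
import math

def psi_RQ(p, W, t):
    """log2 Tr R_p^t Q_p^(1-t) for a classical channel with rows W[x] = output law of x."""
    n_out = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(n_out)]  # E_p(W)
    s = 0.0
    for x, px in enumerate(p):
        if px > 0:
            s += px * sum(W[x][y] ** t * q[y] ** (1.0 - t)
                          for y in range(n_out) if W[x][y] > 0 and q[y] > 0)
    return math.log2(s)

def one_shot_lower_bound(p, W, eps, c=1.0, grid=100):
    """Right-hand side of (9) at a fixed input law p, with the sup over t taken on a grid."""
    r = math.log2((1.0 + c) / eps)
    h = max((-t * r - psi_RQ(p, W, t)) / (1.0 - t) for t in (k / grid for k in range(grid)))
    return h - math.log2((2.0 + c + 1.0 / c) / eps)

# Hypothetical example: a noiseless channel on d = 64 symbols with uniform inputs.
d = 64
noiseless = [[1.0 if x == y else 0.0 for y in range(d)] for x in range(d)]
uniform = [1.0 / d] * d
for eps in (0.5, 0.1, 0.01):
    print(eps, one_shot_lower_bound(uniform, noiseless, eps))
```

For this noiseless example the Hoeffding distance equals $\log d$ for every $r$, so the printed values are $\log d - \log(4/\varepsilon)$: positive for the two larger values of $\varepsilon$ and negative (hence trivial) for $\varepsilon = 0.01$, even though the channel can transmit $\log d = 6$ bits without error.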
Remark 3.3. The term $\chi^*_{\log((1+c)/\varepsilon)}(W) - \log\frac{2+c+1/c}{\varepsilon}$ in Theorem 3.2 goes to $-\infty$ as
$\varepsilon$ tends to $0$, and hence the lower bound in Theorem 3.2 becomes trivial below some
threshold $\varepsilon_0$. This comes of course as no surprise: even if the asymptotic capacity of
the channel is non-zero, it might not be possible to transmit more than one message
with arbitrarily small error probability in one single use of the channel; see, e.g.,
Example D.2.

Note that the Hoeffding distance is monotonically decreasing in its parameter and
hence, with the choice $c = 1$ in Theorem 3.2, we get

\[ C_\varepsilon(W) \ge \chi^*_{\log(2/\varepsilon)}(W) - \log(4/\varepsilon) \ge \chi^*_{\log(4/\varepsilon)}(W) - \log(4/\varepsilon). \tag{11} \]

This lower bound is strictly positive if and only if there exists a $p \in \mathcal{M}_f(\mathcal{X})$ for which
$H_{\log(4/\varepsilon)}(R_p\|Q_p) - \log(4/\varepsilon) > 0$.

Note that $\operatorname{supp}R_p \le \operatorname{supp}Q_p$ and hence $\psi_{R_p,Q_p}(1) = 0$, and (3) implies that for any
$r > 0$,

\[ H_r(R_p\|Q_p) - r = \varphi_{R_p,Q_p}(a_r) - \hat\varphi_{R_p,Q_p}(a_r) = a_r. \]

Note that $a_r = 0$ is equivalent to $r = \hat\varphi_{R_p,Q_p}(0) = C(R_p\|Q_p)$,
the Chernoff information in the classical-quantum state $R_p$. Since $a_r = \hat\varphi_{R_p,Q_p}^{-1}(r)$, and
$\hat\varphi_{R_p,Q_p}^{-1}$ is monotonically decreasing, we finally get that

\[ H_r(R_p\|Q_p) - r > 0 \quad \text{if and only if} \quad r < C(R_p\|Q_p). \]

Hence, the lower bound in (11) is strictly positive if and only if

\[ \log(4/\varepsilon) < \chi^*_C(W), \quad \text{or equivalently,} \quad 4\cdot 2^{-\chi^*_C(W)} < \varepsilon, \]

where $\chi^*_C(W) := \sup_{p\in\mathcal{M}_f(\mathcal{X})} C(R_p\|Q_p)$ is the Chernoff capacity of the channel. □



Remark 3.4. Example D.2 shows that for any $\varepsilon \in [0,1/2)$ and any $K > 0$ there
exists a channel $W$ such that $C_\varepsilon(W) = 0$ while $\chi^*(W) > K$. This shows that there
exists no function $f : \mathbb{R}_+ \to \mathbb{R}_+$ for which $C_\varepsilon(W) \ge \chi^*(W) - f(\varepsilon)$ would hold for
every channel. Hence, we cannot have a lower bound similar to (9) with $\chi^*(W)$ in
place of the Hoeffding capacity. □



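Theorem 3.2 yields immediately the following:

Corollary 3.5. For any channel $W$, any $c > 0$, any $\varepsilon > 0$ and any $n \in \mathbb{N}$, the capacity
per channel use for $n$ uses of the channel is lower bounded as

\[ \frac{1}{n}C_\varepsilon(W^{(n)}) \ge \chi^*_{\frac{1}{n}\log((1+c)/\varepsilon)}(W) - \frac{1}{n}\log\frac{2+c+1/c}{\varepsilon}. \]

Proof. Note that for any $p \in \mathcal{M}_f(\mathcal{X})$ and any $n \in \mathbb{N}$,

\[ R_{p^{\otimes n}} = R_p^{\otimes n}, \qquad Q_{p^{\otimes n}} = Q_p^{\otimes n} \qquad \text{and} \qquad H_r(R_{p^{\otimes n}}\|Q_{p^{\otimes n}}) = nH_{r/n}(R_p\|Q_p) \]

for any $r > 0$. By Theorem 3.2,

\[ \frac{1}{n}C_\varepsilon(W^{(n)}) \ge \frac{1}{n}H_{\log((1+c)/\varepsilon)}(R_{p^{\otimes n}}\|Q_{p^{\otimes n}}) - \frac{1}{n}\log\frac{2+c+1/c}{\varepsilon} = H_{\frac{1}{n}\log((1+c)/\varepsilon)}(R_p\|Q_p) - \frac{1}{n}\log\frac{2+c+1/c}{\varepsilon}, \]

from which the statement follows. □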



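Theorem 3.2 only provides a lower bound on the one-shot $\varepsilon$-capacity of a channel.
However, it is asymptotically sharp in the sense that the lower bound of the HSW
theorem can be recovered from it. Indeed, by Corollary 3.5, the first inequality in (7),
and by (4),

\[ \overline{C}_\varepsilon(W) \ge \liminf_{n\to\infty}\frac{1}{n}C_\varepsilon(W^{(n)}) \ge \lim_{n\to\infty}\Big[ H_{\frac{1}{n}\log((1+c)/\varepsilon)}(R_p\|Q_p) - \frac{1}{n}\log\frac{2+c+1/c}{\varepsilon} \Big] = H_0(R_p\|Q_p) = S(R_p\|Q_p), \qquad \varepsilon > 0, \]

from which, taking the supremum over $p \in \mathcal{M}_f(\mathcal{X})$, $\overline{C}_\varepsilon(W) \ge \chi^*(W)$ follows for all $\varepsilon > 0$. Considering a sequence $\varepsilon_n \to 0$,
one also gets $\overline{C}_0(W) \ge \chi^*(W)$.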
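The convergence of the per-use bound of Corollary 3.5 to the mutual information $S(R_p\|Q_p)$ can be observed numerically. The following sketch assumes a hypothetical binary symmetric channel with flip probability 0.1 and uniform inputs, uses $c = 1$, and again replaces the supremum over $t$ by a grid maximum (which can only lower the computed value); the function names are illustrative.

```python
import math

def psi_RQ(p, W, t):
    """log2 Tr R_p^t Q_p^(1-t) for a classical channel given by a row-stochastic matrix W."""
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    s = sum(p[x] * W[x][y] ** t * q[y] ** (1.0 - t)
            for x in range(len(p)) for y in range(len(q))
            if p[x] > 0 and W[x][y] > 0 and q[y] > 0)
    return math.log2(s)

def mutual_information(p, W):
    """S(R_p || Q_p) in bits for a classical channel."""
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(len(p)) for y in range(len(q))
               if p[x] > 0 and W[x][y] > 0)

def per_use_lower_bound(p, W, eps, n, c=1.0, grid=4000):
    """Right-hand side of Corollary 3.5 at a fixed input law p (grid sup over t)."""
    r = math.log2((1.0 + c) / eps) / n
    h = max((-t * r - psi_RQ(p, W, t)) / (1.0 - t) for t in (k / grid for k in range(grid)))
    return h - math.log2((2.0 + c + 1.0 / c) / eps) / n

bsc = [[0.9, 0.1], [0.1, 0.9]]     # hypothetical binary symmetric channel
p = [0.5, 0.5]
info = mutual_information(p, bsc)
bounds = [per_use_lower_bound(p, bsc, 0.1, n) for n in (10, 100, 1000, 10000)]
print(info, bounds)
```

The bounds increase with $n$ and stay below the mutual information, approaching it slowly; for small $n$ the bound is negative and therefore trivial.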

It is known that for rates below the capacity $\chi^*(W)$, one can find a sequence of
codes for which the error probabilities vanish with an exponential speed, and hence
$C^{\mathrm{exp}}_0(W) \ge \chi^*(W)$. One can use Lemma 3.1 to give a lower bound on the exponential
capacities that yields the above lower bound as a special case:

Theorem 3.6. For any channel $W$ and $r \ge 0$,

\[ C^{\mathrm{exp}}_r(W) \ge \chi^*_r(W) - r. \]

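Proof. The statement is trivial if $\chi^*_r(W) - r \le 0$, hence for the rest we assume it to
be strictly positive. Let $0 < R < \chi^*_r(W) - r$. By definition, there exists a $p \in \mathcal{M}_f(\mathcal{X})$
such that $R < H_r(R_p\|Q_p) - r$. By the definition of the Hoeffding distances, there
exists a $t \in [0,1)$ such that

\[ R < -\frac{tr}{1-t} + S_t(R_p\|Q_p) - r = \frac{1}{1-t}\left(-r - \log\operatorname{Tr}R_p^{t}Q_p^{1-t}\right), \]

or equivalently,

\[ 2^{(1-t)R}\operatorname{Tr}R_p^{t}Q_p^{1-t} < 2^{-r}. \]

Now, for each $n \in \mathbb{N}$ we can apply Lemma 3.1 with the channel being $W^{(n)}$, the
probability distribution $p^{\otimes n}$, $c = 1$ and $M^{(n)} := \lfloor 2^{nR}\rfloor$, and get the existence of an $M^{(n)}$-code
such that

\[ P_e(M^{(n)},\varphi^{(n)},E^{(n)}) \le 2^{t}4^{1-t}\big(M^{(n)}-1\big)^{1-t}\operatorname{Tr}R_{p^{\otimes n}}^{t}Q_{p^{\otimes n}}^{1-t}. \]

Since $R_{p^{\otimes n}} = (R_p)^{\otimes n}$ and $Q_{p^{\otimes n}} = (Q_p)^{\otimes n}$, we finally get

\[ P_e(M^{(n)},\varphi^{(n)},E^{(n)}) \le 4\left(2^{(1-t)R}\operatorname{Tr}R_p^{t}Q_p^{1-t}\right)^{n} < 4\cdot 2^{-nr}, \]

from which the assertion follows. □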



Remark 3.7. As we have seen in Remark 3.3, the lower bound $\chi^*_r(W) - r$ is strictly
positive if and only if $r < \chi^*_C(W)$.

4 Divergence radii and an upper bound 

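The divergence radius of a subset $\Sigma \subset \mathcal{S}(\mathcal{H})$ with respect to some generalized relative
entropy $D$ is defined as

\[ R_D(\Sigma) := \inf_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{\rho\in\Sigma} D(\rho\|\sigma). \]

In particular, we denote by $R_{\max}(\Sigma) := R_{S_{\max}}(\Sigma)$ the max-relative entropy radius of
$\Sigma$. We have the following:

Theorem 4.1. For any channel $W$ and $\varepsilon > 0$,

\[ C_\varepsilon(W) \le R_{\max}(\operatorname{ran}W) - \log(1-\varepsilon). \]

Proof. Let $(M,\varphi,E)$ be an $M$-code for which $P_e(M,\varphi,E) \le \varepsilon$. By (6),

\[ P_e(M,\varphi,E) \ge 1 - \frac{1}{M}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\max_{1\le k\le M} 2^{S_{\max}(W_{\varphi_k}\|\sigma)}, \]

which yields

\[ \log M \le -\log(1-\varepsilon) + R_{\max}(\{W_{\varphi_k}\}_{k=1}^{M}) \le -\log(1-\varepsilon) + R_{\max}(\operatorname{ran}W), \]

from which the statement follows. □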
Remark 4.2. An alternative proof of the above Theorem can be obtained using
Lemma 4 in [10], which states that for any code $(M,\varphi,E)$, any state $\sigma \in \operatorname{ran}W$ and
any $\gamma \in \mathbb{R}$,

\[ P_e(M,\varphi,E) \ge \frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}W_{\varphi_k}\{2^{\gamma}\sigma - W_{\varphi_k} \ge 0\} - \frac{2^{\gamma}}{M}. \]

Choosing therefore $\gamma := R_{\max}(\operatorname{ran}W)$ and $\sigma$ to be a state where the infimum in the
definition of $R_{\max}(\operatorname{ran}W)$ is attained, one obtains

\[ P_e(M,\varphi,E) \ge 1 - 2^{R_{\max}(\operatorname{ran}W)}/M. \]

Therefore, $P_e(M,\varphi,E) \le \varepsilon$ yields $\log M \le R_{\max}(\operatorname{ran}W) - \log(1-\varepsilon)$.

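Remark 4.3. The additivity of the max-relative entropy on product states yields that
$R_{\max}(\operatorname{ran}W^{(n)}) \le nR_{\max}(\operatorname{ran}W)$ and hence, by Theorem 4.1,

\[ \overline{C}_\varepsilon(W) \le \liminf_{n\to\infty}\frac{1}{n}C_{\varepsilon'}(W^{(n)}) \le \liminf_{n\to\infty}\frac{1}{n}R_{\max}(\operatorname{ran}W^{(n)}) \le R_{\max}(\operatorname{ran}W) \]

for any $\varepsilon < \varepsilon' < 1$. This upper bound, however, is not optimal in general, as Example
D.2 shows.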

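Remark 4.4. As noted before, $R_D(\operatorname{ran}W) = \chi^*_D(W)$ when $D$ is the relative entropy.
For a general $D$, such an identity does not hold. However, when $D$ is the max-relative
entropy, $S_{\max}(R_p\|Q_p) = \max_{x\in\operatorname{supp}p} S_{\max}(W_x\|\mathbb{E}_p(W))$ yields

\[ \begin{aligned} R_{\max}(\operatorname{ran}W) &= \sup_{p\in\mathcal{M}_f(\mathcal{X})}\inf_{\sigma\in\mathcal{S}(\mathcal{H})}\sup_{x\in\operatorname{supp}p} S_{\max}(W_x\|\sigma) \\ &\le \sup_{p\in\mathcal{M}_f(\mathcal{X})}\max_{x\in\operatorname{supp}p} S_{\max}(W_x\|\mathbb{E}_p(W)) \\ &= \sup_{p\in\mathcal{M}_f(\mathcal{X})} S_{\max}(R_p\|Q_p) = \chi^*_{S_{\max}}(W). \end{aligned} \]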

5 Conclusion 

We have shown lower bounds on the one-shot capacities and the exponential capacities 
of a classical-quantum channel in terms of its Hoeffding capacity, and an upper bound 
in terms of the max-relative entropy radius of its image. While the lower bounds on 
the one-shot capacities were shown to be asymptotically tight, the same is not known
for the upper bounds of Theorem 4.1. It is an open question whether a sensible upper
bound can also be found in terms of the Hoeffding capacities. To the best of our
knowledge, our lower bound is a new result even for classical channels.

The exponential capacities considered in this paper are in some sense dual to the
well-known notion of the error exponent in channel coding theory. The latter is defined 
as the optimal exponential decay rate of the error probabilities for sequences of codes 
with a fixed transmission rate. An upper bound on the error exponent was given in 
inequality (11) in [9], and one can easily verify that the lower bound in our Theorem
3.6 can actually be derived from that.

In Stein's lemma of hypothesis testing, one is interested in the asymptotic behaviour
of the quantities $(1/n)\beta_\varepsilon(\rho_n\|\sigma_n)$ for two sequences of states $\{\rho_n\}_{n\in\mathbb{N}}$ and
$\{\sigma_n\}_{n\in\mathbb{N}}$ and some $\varepsilon \in (0,1)$, where
$\beta_\varepsilon(\rho_n\|\sigma_n) := \inf\{\log\operatorname{Tr}\sigma_n A : 0 \le A \le I,\ \operatorname{Tr}\rho_n(I-A) \le \varepsilon\}$
is the logarithm of the optimal error probability of the second
kind under the constraint that the error probability of the first kind stays below
$\varepsilon$. When $\rho_n$ and $\sigma_n$ are the $n$th i.i.d. extensions of the states $\rho_1$ and $\sigma_1$, respectively,
then $\lim_{n\to\infty}(1/n)\beta_\varepsilon(\rho_n\|\sigma_n) = -S(\rho_1\|\sigma_1)$ for any $\varepsilon \in (0,1)$ [13, 22]. This result
provides an operational interpretation of the relative entropy, and was used in [23] to
give an alternative proof for the achievability part of the HSW theorem, namely that
$\overline{C}_0(W) \ge \sup_{p\in\mathcal{M}_f(\mathcal{X})} S(R_p\|Q_p)$. Recently, upper and lower bounds on the one-shot
capacities of classical [28] and classical-quantum channels [22] were obtained in terms
of the quantities $\beta_\varepsilon(R_p\|Q_p)$. These results refine the connection between channel
coding and hypothesis testing by establishing a connection between the operational 
quantities of the two theories. At the moment it is not clear how the lower bounds of 
and our lower bound are related to each other. 
Appendix A



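Example A.1. Let $0 < a < 1/2$ and define density operators

\[ \rho := \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \qquad \text{and} \qquad \sigma := \begin{pmatrix} a & 0 \\ 0 & 1-a \end{pmatrix} \]

on $\mathcal{H} := \mathbb{C}^2$. One can easily see that $\lambda\sigma - \rho \ge 0$ if and only if $\lambda \ge \frac{1}{2a(1-a)}$, and hence

\[ S_{\max}(\rho\|\sigma) = \log\frac{1}{2a(1-a)}. \]

On the other hand, a straightforward computation yields

\[ S_\infty(\rho\|\sigma) = -\log\min\{a, 1-a\} = \log\frac{1}{a}. \]

By assumption, $2(1-a) > 1$, and hence,

\[ S_{\max}(\rho\|\sigma) < S_\infty(\rho\|\sigma). \]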
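The threshold $\lambda = 1/(2a(1-a))$ in Example A.1 can be double-checked numerically. The sketch below is an illustration only: it uses the fact that a real symmetric $2\times 2$ matrix is positive semidefinite exactly when its trace and determinant are non-negative, and finds the smallest admissible $\lambda$ by bisection for the hypothetical value $a = 0.25$; the helper names are not from the paper.

```python
import math

def is_psd_2x2(m, tol=1e-12):
    """A real symmetric 2x2 matrix is positive semidefinite iff trace >= 0 and det >= 0."""
    (a, b), (c, d) = m
    return a + d >= -tol and a * d - b * c >= -tol

def smallest_lambda(a, iters=100):
    """Bisection for min{lam : lam*sigma - rho >= 0}, rho = (1/2)[[1,1],[1,1]], sigma = diag(a, 1-a)."""
    def dominated(lam):
        return is_psd_2x2([[lam * a - 0.5, -0.5], [-0.5, lam * (1.0 - a) - 0.5]])
    # hi is twice the claimed threshold, so the domination test passes there.
    lo, hi = 0.0, 1.0 / (a * (1.0 - a))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if dominated(mid) else (mid, hi)
    return hi

a = 0.25
lam = smallest_lambda(a)
s_max = math.log2(lam)            # max-relative entropy of Example A.1
s_inf = math.log2(1.0 / a)        # Renyi relative entropy of parameter infinity
print(lam, 1.0 / (2 * a * (1 - a)))
print(s_max, s_inf)
```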

Example A.2. Let $\rho_k, \sigma_k$, $k = 1,\ldots,r$, be density operators on a Hilbert space $\mathcal{H}$
such that $\operatorname{supp}\rho_k \le \operatorname{supp}\sigma_k$ for all $k$. Let $\delta_k$, $k = 1,\ldots,r$, be a set of orthogonal
rank-one projections in some auxiliary Hilbert space $\mathcal{K}$, and let $p_1,\ldots,p_r$ be strictly
positive convex weights. Then,

\[ S_{\max}\Big(\sum_k p_k\,\delta_k\otimes\rho_k \,\Big\|\, \sum_k p_k\,\delta_k\otimes\sigma_k\Big) = \max_k S_{\max}(\rho_k\|\sigma_k) > \sum_k p_k S_{\max}(\rho_k\|\sigma_k) \]

unless $S_{\max}(\rho_k\|\sigma_k)$ is the same for all $k$.



Appendix B 

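Let $1_{\{1\}},\ldots,1_{\{M\}}$ be the standard basis of $\mathbb{C}^M$, and define
$\rho := \sum_{k=1}^{M}(1/M)\,|1_{\{k\}}\rangle\langle 1_{\{k\}}|\otimes\rho_k$. The optimal success probability can be expressed as

\[ P^* = \sup_{\{E_1,\ldots,E_M\}} \operatorname{Tr}\rho\sum_{k=1}^{M}|1_{\{k\}}\rangle\langle 1_{\{k\}}|\otimes E_k = \sup_{\substack{0\le E\in\mathcal{B}(\mathbb{C}^M\otimes\mathcal{H}),\\ \operatorname{Tr}_{\mathbb{C}^M}E = I_{\mathcal{H}}}} \operatorname{Tr}\rho E, \]

where the first supremum is taken over all POVM $\{E_1,\ldots,E_M\}$. Using the duality
theorem of linear programming, it was shown in [16, Lemma 4] (see also formula III.15
in ISO]), that the right-hand side of the above formula is equal to

\[ \begin{aligned} \inf\{\operatorname{Tr}B : B \ge 0,\ \rho \le I_{\mathbb{C}^M}\otimes B\} &= \inf\{\operatorname{Tr}B : B \ge 0,\ \rho_k \le MB,\ k = 1,\ldots,M\} \\ &= \frac{1}{M}\inf\{\operatorname{Tr}B : B \ge 0,\ \rho_k \le B,\ k = 1,\ldots,M\}. \end{aligned} \]

Replacing the infimum over $B$ with infima over $\sigma := (1/\operatorname{Tr}B)B$ and $\lambda := \operatorname{Tr}B$, we
finally obtain (6).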



Appendix C 

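For the proof of Lemma 3.1 we need the following Lemma, which is essentially a
restatement of inequality (44) in [10]. For readers' convenience, we give a detailed
proof here.

Lemma C.1. For any $M \in \mathbb{N}$, any $c > 0$, any finitely supported probability
distribution $p$ on $\mathcal{X}$ and any $\pi : \operatorname{supp}p \to \mathcal{B}(\mathcal{H})$ such that $0 \le \pi(x) \le I$, $x \in \operatorname{supp}p$,
there exists an $M$-code $(M,\varphi,E)$ such that

\[ P_e(M,\varphi,E) \le (1+c)\sum_x p_x\operatorname{Tr}W_x(I-\pi(x)) + (2+c+1/c)(M-1)\sum_x p_x\operatorname{Tr}\mathbb{E}_p(W)\pi(x). \tag{12} \]

Proof. For each $\mathbf{x} = (x_1,\ldots,x_M) \in \mathcal{X}^M$, define an $M$-code $(M,\varphi(\mathbf{x}),E(\mathbf{x}))$ by

\[ \varphi_k(\mathbf{x}) := x_k, \qquad E_k(\mathbf{x}) := [A_k(\mathbf{x}) + B_k(\mathbf{x})]^{-1/2}A_k(\mathbf{x})[A_k(\mathbf{x}) + B_k(\mathbf{x})]^{-1/2}, \qquad k = 1,\ldots,M, \]

where

\[ A_k(\mathbf{x}) := \pi(x_k), \qquad B_k(\mathbf{x}) := \sum_{i\ne k}\pi(x_i), \qquad k = 1,\ldots,M. \]

Lemma 2 in [10] tells that $I - (A+B)^{-1/2}A(A+B)^{-1/2} \le (1+c)(I-A) + (2+c+1/c)B$
for any operators $0 \le A \le I$, $0 \le B$ on some Hilbert space $\mathcal{H}$ and any number $c > 0$.
Applying it to $A = A_k(\mathbf{x})$ and $B = B_k(\mathbf{x})$, we get the bound

\[ I - E_k(\mathbf{x}) \le (1+c)(I - \pi(x_k)) + (2+c+1/c)B_k(\mathbf{x}), \qquad c > 0, \]

by which we get the following upper bound on the average error probability:

\[ \begin{aligned} P_e(\mathbf{x}) := P_e(M,\varphi(\mathbf{x}),E(\mathbf{x})) &= \frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}W_{x_k}(I - E_k(\mathbf{x})) \\ &\le (1+c)\frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}W_{x_k}(I - \pi(x_k)) + (2+c+1/c)\frac{1}{M}\sum_{k=1}^{M}\operatorname{Tr}W_{x_k}B_k(\mathbf{x}). \end{aligned} \tag{13} \]

Note that for each $k$, $\mathbf{x} \mapsto W_{x_k}$ and $\mathbf{x} \mapsto B_k(\mathbf{x})$ are independent random variables on
$\mathcal{X}^M$ with respect to any product measure on $\mathcal{X}^M$. Hence, taking the expectation value
of both sides of the inequality in (13) with respect to the product measure $p^{\otimes M}$ yields
that $\mathbb{E}_{p^{\otimes M}}P_e$ is upper bounded by the right-hand side of (12). Therefore, there has to
exist at least one $\mathbf{x} \in (\operatorname{supp}p)^M$ for which $P_e(\mathbf{x})$ is upper bounded by the right-hand
side of (12), from which the assertion follows. □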
Proof of Lemma 3.1. For a function $\pi : \operatorname{supp}p \to \mathcal{B}(\mathcal{H})$ such that $0 \le \pi(x) \le I$,
$x \in \operatorname{supp}p$, define $\Pi := \sum_{x\in\operatorname{supp}p}\delta_x\otimes\pi(x)$. With this notation, the upper bound in
(12) can be rewritten as

\[ \operatorname{Tr}A(I-\Pi) + \operatorname{Tr}B\Pi, \qquad A := (1+c)R_p, \qquad B := (2+c+1/c)(M-1)Q_p, \tag{14} \]

which is minimized over all possible choices of $\Pi$ at the Holevo-Helstrom test

\[ \Pi^* := \{A - B > 0\} = \sum_x \delta_x\otimes\{(1+c)W_x - (2+c+1/c)(M-1)\mathbb{E}_p(W) > 0\}. \]

(Here we use the notation $\{X > 0\}$ to denote the spectral projection corresponding
to the positive part of the spectrum of a self-adjoint operator $X$.) Choosing therefore
$\pi(x) := \{(1+c)W_x - (2+c+1/c)(M-1)\mathbb{E}_p(W) > 0\}$ in Lemma C.1, we get the
existence of an $M$-code for which the average error probability is upper bounded by
the value of (14) at $\Pi^*$, which is easily seen to be $\frac{1}{2}\operatorname{Tr}(A+B) - \frac{1}{2}\operatorname{Tr}|A-B|$. By
Theorem 1 in [1],

\[ \frac{1}{2}\operatorname{Tr}(A+B) - \frac{1}{2}\operatorname{Tr}|A-B| \le \operatorname{Tr}A^{t}B^{1-t} \]

for any two positive semidefinite operators $A$ and $B$ on some Hilbert space $\mathcal{H}$ and any
$t \in [0,1]$. Applying it to the above choice of $A$ and $B$, we finally get the assertion of
the Lemma. □



Appendix D 

Example D.1. (Classical state discrimination)

Let $\rho_1,\ldots,\rho_M$ be commuting states on a Hilbert space $\mathcal{H}$. Then, there exists a basis
$e_1,\ldots,e_d$ of $\mathcal{H}$ in which all the states are diagonal and hence they can be represented as
functions on $\mathcal{X} := \{1,\ldots,d\}$ in an obvious way. Let $\mathcal{E}(A) := \sum_{k=1}^{d}\langle e_k, Ae_k\rangle|e_k\rangle\langle e_k|$, $A \in \mathcal{B}(\mathcal{H})$.
If we use the POVM $E_1,\ldots,E_M$ to discriminate the states then the corresponding
success probability is

\[ \frac{1}{M}\sum_{i=1}^{M}\operatorname{Tr}\rho_iE_i = \frac{1}{M}\sum_{i=1}^{M}\operatorname{Tr}\rho_i\,\mathcal{E}(E_i), \]

and hence we can assume without loss of generality that the POVM operators are also
functions on $\mathcal{X}$. The success probability is then

\[ \begin{aligned} \frac{1}{M}\sum_{i=1}^{M}\sum_{k=1}^{d}\rho_i(k)E_i(k) &= \frac{1}{M}\sum_{k=1}^{d}\sum_{i=1}^{M}\rho_i(k)E_i(k) \le \frac{1}{M}\sum_{k=1}^{d}\sum_{i=1}^{M}m(k)E_i(k) \\ &= \frac{1}{M}\sum_{k=1}^{d}m(k)\sum_{i=1}^{M}E_i(k) \le \frac{1}{M}\sum_{k=1}^{d}m(k), \end{aligned} \]

where $m(k) := \max_{1\le i\le M}\rho_i(k)$, $k = 1,\ldots,d$. We say that $E_1,\ldots,E_M$ is a maximum
likelihood measurement if $\sum_{i=1}^{M}E_i = I$ and $E_i(k) = 0$ when $\rho_i(k) \ne m(k)$. It is easy
to see that all the above inequalities hold with equality for any maximum likelihood
measurement. Therefore, the optimal success probability of discriminating the states
$\rho_1,\ldots,\rho_M$ is

\[ P^* = \frac{1}{M}\sum_{k=1}^{d}m(k), \]

and hence, by (6), the max-relative entropy radius of $\{\rho_1,\ldots,\rho_M\}$ is

\[ R_{\max}(\{\rho_1,\ldots,\rho_M\}) = \log\sum_{k=1}^{d}m(k). \tag{15} \]
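The content of Example D.1 can be checked on a small instance. In the sketch below the three commuting states are hypothetical probability vectors, the brute-force search runs only over deterministic decision rules (which suffices here, since the success probability is linear in the diagonal POVM entries), and `sigma_star`, the normalized vector of pointwise maxima, is used as a candidate minimizer in (6); all names are illustrative.

```python
import itertools
import math

def ml_success(states):
    """(1/M) * sum_k max_i rho_i(k): success probability of a maximum likelihood measurement."""
    d = len(states[0])
    return sum(max(s[k] for s in states) for k in range(d)) / len(states)

def brute_force_success(states):
    """Best success probability over all deterministic decision rules k -> guess[k]."""
    m, d = len(states), len(states[0])
    return max(sum(states[g[k]][k] for k in range(d)) / m
               for g in itertools.product(range(m), repeat=d))

def s_max(rho, sigma):
    """Max-relative entropy of commuting states: log2 max_k rho(k)/sigma(k)."""
    return math.log2(max(r / s for r, s in zip(rho, sigma) if r > 0))

states = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]   # hypothetical example
m_k = [max(s[k] for s in states) for k in range(3)]
sigma_star = [v / sum(m_k) for v in m_k]          # candidate minimizer in (6)
print(ml_success(states), brute_force_success(states))
print(math.log2(sum(m_k)), max(s_max(s, sigma_star) for s in states))
```

Both printed pairs agree: the maximum likelihood rule attains the brute-force optimum, and the value $\log\sum_k m(k)$ of (15) is attained at `sigma_star`.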

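Example D.2. (Capacities of the classical depolarizing channel)

We define the classical depolarizing channel with parameters $d \in \mathbb{N}$ and $a \in (0,1)$
the following way. Let $\mathcal{X} := \{1,\ldots,d\}$ and $\mathcal{H} := \mathbb{C}^d$, and let $\delta_k := |1_{\{k\}}\rangle\langle 1_{\{k\}}|$, where
$1_{\{1\}},\ldots,1_{\{d\}}$ is the standard basis of $\mathbb{C}^d$. The action of the channel is

\[ W^{d,a} : k \mapsto a\,\delta_k + (1-a)\frac{1}{d}I, \qquad 1 \le k \le d. \]

Note that the outputs of the channel are diagonal in the standard basis of $\mathbb{C}^d$ and
hence we will identify them with functions on $\mathcal{X}$ in an obvious way. Consider an
$M$-code $(M,\varphi,E)$ and define

\[ m_a(k) := \max_{1\le i\le M}W^{d,a}_{\varphi(i)}(k) = \frac{1-a}{d} + a\max_{1\le i\le M}\delta_{\varphi(i)}(k), \qquad 1 \le k \le d. \]

Note that

\[ \begin{aligned} \sum_{k=1}^{d}m_a(k) &= 1 - a + a\sum_{k=1}^{d}\max_{1\le i\le M}\delta_{\varphi(i)}(k) \le 1 - a + a\sum_{k=1}^{d}\sum_{i=1}^{M}\delta_{\varphi(i)}(k) \\ &= 1 - a + a\sum_{i=1}^{M}\sum_{k=1}^{d}\delta_{\varphi(i)}(k) = 1 - a + aM. \end{aligned} \]

Hence, by Example D.1 the success probability of any code is upper bounded as

\[ P_s(M,\varphi,E) \le \frac{1-a}{M} + a. \]

If $\varepsilon \in [0,1/2)$ then there exists an $a \in (0,1)$ such that $1 - \varepsilon > \frac{1-a}{2} + a$ and hence,
by the above bound, there exists no code $(M,\varphi,E)$ with $M > 1$ and $P_e(M,\varphi,E) \le \varepsilon$.
Therefore, $C_\varepsilon(W^{d,a}) = 0$.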
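For small parameters the success-probability bound of Example D.2 can be verified by exhaustive search. The sketch below uses hypothetical values $d = 4$, $a = 0.2$, $\varepsilon = 0.3$, enumerates all encodings of $M$ messages, and evaluates each with the maximum likelihood success probability from Example D.1; inputs are labelled $0,\ldots,d-1$ instead of $1,\ldots,d$, and the function names are illustrative.

```python
import itertools

def output(d, a, x):
    """Output distribution W_x = a*delta_x + (1-a)/d of the classical depolarizing channel."""
    return [a * (1.0 if k == x else 0.0) + (1.0 - a) / d for k in range(d)]

def best_success(d, a, m):
    """Largest ML success probability over all encodings of m messages into {0,...,d-1}."""
    best = 0.0
    for phi in itertools.product(range(d), repeat=m):
        outs = [output(d, a, x) for x in phi]
        best = max(best, sum(max(o[k] for o in outs) for k in range(d)) / m)
    return best

d, a, eps = 4, 0.2, 0.3
for m in (2, 3, 4):
    # The second printed number is the bound (1-a)/M + a from the example.
    print(m, best_success(d, a, m), (1.0 - a) / m + a)
```

For these parameters the bound is attained by codes with distinct codewords, and every code with $M \ge 2$ has success probability at most $0.6 < 1 - \varepsilon$, so no such code meets the error constraint.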
On the other hand, the Holevo capacity is

\[ \chi^*(W^{d,a}) = \sup_{p\in\mathcal{M}_f(\mathcal{X})}\Big\{ S\Big(\sum_x p_xW^{d,a}_x\Big) - \sum_x p_xS\big(W^{d,a}_x\big) \Big\} = \sup_{p\in\mathcal{M}_f(\mathcal{X})} S\Big(\sum_x p_xW^{d,a}_x\Big) - S\big(W^{d,a}_1\big), \]

where we have used that $W^{d,a}_k$ is a permutation of $W^{d,a}_1$ and hence their entropies
are equal. The first term is clearly upper bounded by $S((1/d)I) = \log d$, which
can actually be reached by choosing $p$ to be the uniform distribution on $\{1,\ldots,d\}$.
Therefore,

\[ \chi^*(W^{d,a}) = \log d - S\big(W^{d,a}_1\big), \]

and a straightforward computation yields

\[ \chi^*(W^{d,a}) = \frac{1+(d-1)a}{d}\log(1+(d-1)a) + \frac{d-1}{d}(1-a)\log(1-a), \]

which scales as $\log d$ as $d$ tends to infinity, for any parameter $a \in (0,1)$.



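Finally, by (15), the max-relative entropy radius of $\operatorname{ran}W^{d,a}$ is given by

\[ R_{\max}(\operatorname{ran}W^{d,a}) = \log\sum_{k=1}^{d}\max_{1\le i\le d}W^{d,a}_i(k) = \log(1+(d-1)a). \]

Note that

\[ \chi^*(W^{d,a}) < \log(1+(d-1)a) = R_{\max}(\operatorname{ran}W^{d,a}). \]

Since the Holevo capacity is equal to the relative entropy radius $R(\operatorname{ran}W^{d,a})$,
this shows that

\[ R_{\max}(\operatorname{ran}W^{d,a}) > R(\operatorname{ran}W^{d,a}). \]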
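The closed formulas of Example D.2 are easy to sanity-check numerically. The following sketch compares the direct expression $\log d - S(W^{d,a}_1)$ with the closed form for the Holevo capacity and with the max-relative entropy radius $\log(1+(d-1)a)$, for the hypothetical value $a = 0.3$ and a few dimensions; logarithms are to base 2 and the function names are illustrative.

```python
import math

def entropy(v):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(x * math.log2(x) for x in v if x > 0)

def holevo_direct(d, a):
    """log2(d) - S(W_1), with W_1 = a*delta_1 + (1-a)/d."""
    w1 = [a + (1.0 - a) / d] + [(1.0 - a) / d] * (d - 1)
    return math.log2(d) - entropy(w1)

def holevo_closed_form(d, a):
    """Closed form of the Holevo capacity stated in Example D.2."""
    big = 1.0 + (d - 1) * a
    return big / d * math.log2(big) + (d - 1) / d * (1.0 - a) * math.log2(1.0 - a)

def r_max(d, a):
    """Max-relative entropy radius of the image of the channel."""
    return math.log2(1.0 + (d - 1) * a)

a = 0.3
for d in (2, 16, 1024):
    print(d, holevo_direct(d, a), holevo_closed_form(d, a), r_max(d, a))
```

In each row the first two values coincide up to rounding, and the last value is strictly larger, in line with the strict inequality between the two radii.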